FAQs

How do I use dates?
What format is best for importing data?
How much data should I collect?
What data should I collect?
Can I add a new variable?
What is a saved state?
What happens to bad data?

 

How do I use dates?

CVE treats any field containing non-numeric characters as invalid. To use dates within CVE, change the date column to number format in Excel before saving the file as CSV. Excel stores a date as a count of days from 1/1/1900; the decimal places represent the hours, minutes and seconds as a fraction of a day. If necessary, the decimal part may be split into these groups.
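If the dates are prepared outside Excel, the same conversion can be sketched in Python. This is only an illustration and not part of CVE: the function names are hypothetical, and it assumes Excel's default 1900 date system, in which serial numbers for dates after February 1900 count days from 30 December 1899.

```python
from datetime import datetime

# Assumption: Excel's default 1900 date system. For dates after
# February 1900 the serial number equals days since 30 December 1899.
EXCEL_EPOCH = datetime(1899, 12, 30)

def to_excel_serial(moment):
    """Convert a datetime to an Excel-style serial number (days + fraction)."""
    delta = moment - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400

def split_serial(serial):
    """Split a serial number into whole days, hours, minutes and seconds."""
    days = int(serial)
    seconds = round((serial - days) * 86400)  # fraction of a day -> seconds
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return days, hours, minutes, secs

serial = to_excel_serial(datetime(2004, 9, 27, 12, 0, 0))
print(serial)                # 38257.5
print(split_serial(serial))  # (38257, 12, 0, 0)
```

The resulting numbers contain no non-numeric characters, so they can be written straight into a CSV column.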
 

What format is best for importing data?

We recommend the comma-separated values (CSV) format. CVE also accepts space separated and tab delimited data. There is also a native data format, identified by the .dat file extension. The .csv format is compatible with all spreadsheets and, most importantly, preserves missing data, which may be lost in a space or tab delimited file.
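The point about missing data can be seen in a small Python sketch. The column names and values are invented for illustration, and the parsing shown is Python's, not CVE's own importer: an empty CSV field keeps its position, whereas a gap in a space separated row simply disappears.

```python
import csv
import io

# Hypothetical data: the pressure reading is missing in both versions.
csv_text = "temp,pressure,flow\n21.5,,3.2\n"
space_text = "temp pressure flow\n21.5  3.2\n"

csv_rows = list(csv.reader(io.StringIO(csv_text)))
space_rows = [line.split() for line in space_text.splitlines()]

print(csv_rows[1])    # ['21.5', '', '3.2']  (three fields, gap preserved)
print(space_rows[1])  # ['21.5', '3.2']      (two fields, flow shifts into pressure)
```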
 

How much data should I collect?

Because statistical analysis is complex, data collection has traditionally been a difficult task. Analysts have had to be selective when starting new projects to ensure that the data collected is relevant to the research. CVE removes these restrictions, opening the way for much larger data sets and larger problems to be analysed. One will never appreciate the true power of CVE by investigating only a few variables. That is not to say that CVE is not a capable or powerful tool for small data sets; it is simply much better when dealing with large projects.

  • Consider looking at systems rather than the subsystem you suspect is causing the problem.

  • Including information from variables not directly connected to a problem permits one to see the wider picture and very rapidly identify unexpected variables.

  • 30 - 40 variables can easily be plotted and many more are easily managed. Including as many variables as possible at the start of the project will make future analysis work easier and quicker.

The permutation and variable management tools within CVE make it simple to hide and reorder variables to only show those that appear useful.


What data should I collect?

The answer to this question depends largely upon the type of process and timescale.

For example, consider a gas turbine. The residence time of fuel gas in the turbine is very short, just a fraction of a second, but the run time is very long, up to a year between shutdowns. To investigate reduced performance due to blade fouling, one might look at weekly data for a few years. When investigating NOx emissions, 30-second data may be more appropriate. Whatever the period chosen, one should aim for 1000 to 5000 measurements. This is more than sufficient to give a solid impression of the operating characteristic. Using less data will still produce excellent results; as few as 100 rows can produce some startling conclusions. When fewer than 100 rows are available there is still plenty of information, but extreme or unusual events can sometimes be given more weight than they warrant.
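To check how many rows a sampling plan will yield, the arithmetic can be sketched as below. The function name is hypothetical, and the spans used (one day of 30-second data, three years of weekly data) are assumed examples rather than figures from CVE.

```python
from datetime import timedelta

def rows_for(span, interval):
    """Number of whole sampling intervals that fit in the span."""
    return span // interval

# One day of 30-second readings (assumed span).
print(rows_for(timedelta(hours=24), timedelta(seconds=30)))  # 2880

# Three years of weekly readings (assumed span, taken as 156 weeks).
print(rows_for(timedelta(weeks=156), timedelta(weeks=1)))    # 156
```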

When collecting data, consider the number of variables you have chosen. When plotting a graph in two dimensions, just two points will show a line. Add another point and that line might appear as a curve; add another and the line might be straight but with definable error. Add another and the error becomes clearer. Each time you add a point the level of information in the graph grows, but the significance of each individual point diminishes. Remember that a 25 variable parallel coordinate plot shows in one view the equivalent of 300 Cartesian plots, one for each pair of variables, so more data points are required.
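The figure of 300 is the number of distinct pairs that can be chosen from 25 variables, n(n - 1)/2. A short Python check, with a hypothetical function name:

```python
from math import comb

def pairwise_plots(variable_count):
    """Number of two-variable Cartesian plots needed to cover every pair."""
    return comb(variable_count, 2)

print(pairwise_plots(25))  # 300
print(pairwise_plots(40))  # 780
```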
 

Can I add a new variable?

Yes. Variables may be added that are a function of other variables in the data set. The variables used to define a new variable do not need to be visible in the parallel plot. To create a new variable, select the expression variable from the variables menu, change the label and enter the algebraic expression that defines the variable.

CVE also permits you to add other forms of variable, including index and clustering.
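The idea of an expression variable, a new column computed from existing ones, can be sketched in Python. This does not reproduce CVE's menu or expression syntax; the column names, the helper function and the data are all invented for illustration.

```python
# Hypothetical data set: two measured variables per row.
rows = [
    {"fuel_flow": 2.0, "power": 50.0},
    {"fuel_flow": 2.5, "power": 60.0},
]

def add_expression_variable(rows, label, expression):
    """Add a column named `label`, computed from each row by `expression`."""
    for row in rows:
        row[label] = expression(row)
    return rows

# New variable defined as an algebraic expression of two existing variables.
add_expression_variable(rows, "power_per_fuel",
                        lambda r: r["power"] / r["fuel_flow"])

print([row["power_per_fuel"] for row in rows])  # [25.0, 24.0]
```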
 

What is a saved state?

A saved state is a record of where you have reached in the analysis and the steps taken to get there. It is very useful for pausing and restarting, or for storing a position when there are multiple analysis options. For more information, read the tip on saving your current position.
 

What happens to bad data?

Bad data, missing values and non-numeric entries are all handled in exactly the same way. By default CVE gives each a value 5% below the minimum for that variable; this makes such points easy to identify. You may choose to remove these points completely or set them to a different value.
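The default behaviour can be imitated when preparing or checking data outside CVE. The sketch below is not CVE's implementation. In particular, it assumes that "5% below the minimum" means 5% of the variable's range below its minimum, which is one possible reading; the function names and sample column are hypothetical.

```python
import math

def to_number(field):
    """Return the field as a finite float, or None if it is bad data."""
    try:
        value = float(field)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None

def flag_bad_values(column, margin=0.05):
    """Replace bad entries with a placeholder just below the column minimum.

    Assumption: the placeholder sits `margin` times the range below the
    minimum. The column must contain at least one valid number.
    """
    numbers = [to_number(field) for field in column]
    good = [n for n in numbers if n is not None]
    low, high = min(good), max(good)
    placeholder = low - margin * (high - low)
    return [placeholder if n is None else n for n in numbers]

# Missing and non-numeric fields both land on the same placeholder value.
print(flag_bad_values(["10", "20", "", "n/a", "30"]))
# [10.0, 20.0, 9.0, 9.0, 30.0]
```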